Software Construction and Analysis Tools for Future Space Missions
Abstract
NASA and its international partners will increasingly depend on software-based systems to implement advanced functions for future space missions, such as Martian rovers that autonomously navigate long distances exploring geographic features formed by surface water early in the planet's history. The software-based functions for these missions will need to be robust and highly reliable, raising significant challenges in the context of recent Mars mission failures attributed to software faults. After reviewing these challenges, this paper describes tools developed at NASA Ames that could contribute to meeting them: 1) Program synthesis tools based on automated inference that generate documentation for manual review and annotations for automated certification. 2) Model-checking tools for concurrent object-oriented software that achieve scalability through synergy with program abstraction and static analysis tools.
This paper consists of five sections. The first section describes advanced capabilities needed by NASA for future missions that are expected to be implemented in software. The second section describes the risk factors associated with complex software in aerospace missions. To make these risk factors concrete, some of the recent software-related mission failures are summarized. There is a considerable gap between current technology for addressing the risk factors associated with complex software and the future needs of NASA. The third section develops a model of this gap and suggests approaches to closing it through software tool development. The fourth section summarizes research at NASA Ames towards program synthesis tools that generate certifiable code. The fifth section summarizes research at NASA Ames towards software model-checking tools.
1. Software: Enabling Technology for Future NASA Missions
NASA's strategic plan envisions ambitious missions in the next forty years that will project a major human presence into space.
Missions being studied and planned include sample returns from comets, asteroids, and planets; detection of Earth-like planets around other stars; the search for life outside the Earth; intensive study of Earth ecosystems; and the human exploration of Mars. A major enabling factor for these missions is expected to be advanced software and computing systems. This section describes some of the requirements for these mission capabilities.
Autonomous Spacecraft and Rovers. NASA's mission of deep space exploration has given rise to one of the most demanding applications facing the computer science research community: designing, building, and operating progressively more capable autonomous spacecraft, rovers, airplanes, and perhaps even submarines. NASA is planning to fill space with robotic craft to explore the universe beyond in ways never before possible. These surrogate explorers need to be adaptable and self-reliant in harsh and unpredictable environments. Uncertainty about hazardous terrain and the great distances from Earth will require that the rovers be able to navigate and maneuver autonomously over a wide variety of surfaces to independently perform science tasks. Robotic vehicles will need to become progressively smarter and more independent as they continue to explore Mars and beyond. In essence, robust autonomy software needs to be highly responsive to the environment of the robotic vehicle, without constant intervention and guidance from Earth-based human controllers. In the case of Martian rovers, in the past Earth controllers would up-link commands each Martian day for limited maneuvers (e.g., roll ten meters forward northeast), which would be executed blindly by the rover.
In the future, the commands will be for much more extensive maneuvers (e.g., navigate a kilometer towards a rock formation that is beyond the horizon) that require complex navigation skills to be executed autonomously by the rover, with constant adaptation to terrain and other factors. Such autonomy software, operating in an unknown environment, will have orders of magnitude more possible execution paths and behaviors than today's software. In addition to autonomy for commanding and self-diagnosis, there is an increasing need for an autonomous or semi-autonomous on-board science capability. Deep space probes and rovers send data back to Earth at a very slow rate, limiting the ability of the space science community to fully exploit the presence of our machines on distant planets. There is a strong need for spacecraft to have the capacity to do some science processing on-board in an autonomous or semi-autonomous fashion.
Human Exploration of Space. A human mission to Mars will be qualitatively more complex than the Apollo missions to the Moon. The orbital dynamics of the Mars-Earth combination mean that low-energy (and hence reasonable-cost) Mars missions will last two orders of magnitude longer than the Moon missions of the sixties and seventies: specifically, on the order of five hundred days. To achieve science returns commensurate with the cost of a human Mars mission, the scientist-astronauts will need to be freed from the usual role of handyman and lab technician. They will need robotic assistants that both support the scientific aspects of the mission and maintain the equipment and habitat. A particularly interesting issue is that, as spacecraft systems become increasingly capable of independent initiative, the problem of how the human crew and the autonomous systems will interact in these mixed-initiative environments becomes centrally important.
The emerging area of Human-Centered Computing represents a significant shift in thinking about information technology in general, and about smart machines in particular. It embodies a systems view in which human thought and action and technological systems are understood as inextricably linked and equally important aspects of analysis, design, and evaluation. Developing and verifying software for mixed-initiative systems is very challenging, perhaps more so than for completely autonomous software. In contrast to the current paradigm, in which the human commands and the software executes blindly, mixed-initiative software has far more potential execution paths that depend on a continuous stream of human inputs. In this paradigm, the human becomes a complex aspect of the environment in which the software is executing, much more complex than the terrain encountered by a Martian rover. Furthermore, mixed-initiative software needs to be understandable and predictable to the humans interacting with it. Today's methods for developing and verifying high-assurance mixed-initiative software are woefully inadequate. For example, aviation autopilot and flight-management systems behave in ways that are often bewildering and unpredictable to human pilots. Even though they decrease the manual workload of human pilots, they increase the cognitive workload. Automation surprises have been implicated in a number of aviation fatalities. For a mixed human/robotic mission to Mars, the robotic assistants need to be both smart and well-behaved.
2. Aerospace Software Risk Factors
While advances in software technology could enable future mission capabilities at substantially reduced operational cost, there are concerns about being able to design and implement such complex software systems in a reliable and cost-effective manner. Traditional space missions, even without advanced software technology, are already inherently risky.
Charles Perrow’s book [1] identifies two risk dimensions for high-risk technologies: interactions and coupling. Complex interactions are those of unfamiliar or unexpected sequences, and are not immediately comprehensible. Systems that are tightly coupled have multiple time-dependent processes that cannot be delayed or extended. Perrow identifies space missions as having both characteristics; hence space missions are in the riskiest category. The risks that software errors pose to space missions are considerable. Peter Neumann’s book [2] catalogues computer-related problems that have occurred in both manned and unmanned space missions. Given the risks already inherent in today’s software technology, flight project managers are understandably reluctant to risk a science mission on new, unproven information technologies, even if they promise cost savings or enhanced mission capabilities. This creates a hurdle in deploying new technologies, since it is difficult to get them incorporated on the first flight they need for flight qualification. NASA is addressing this hurdle through flight qualification programs for new technology such as New Millennium. However, flight project managers also need to be convinced that any information technology can be verified and validated in the specific context of their mission. This poses a special challenge to advanced software technology, since traditional testing approaches to V&V do not scale by themselves. This section next reviews several software errors that have had significant impact on recent space missions, in order to draw historical lessons on the difference between software failures and hardware failures.
Ariane 501. The first launch of Ariane 5, Flight 501, ended in a disaster caused by a chain of events originating in the inappropriate reuse of a component of Ariane 4’s inertial reference frame software, and in the lack of sufficient documentation describing the operating constraints of that software.
Approximately 40 seconds after launch initiation, an error occurred when an unprotected conversion from a 64-bit floating-point value to a 16-bit signed integer overflowed. This error occurred in both the active and the backup system. The overflow of the value, which was related to horizontal velocity, was due to the much greater horizontal velocity of the Ariane 5 trajectory as compared to the Ariane 4 trajectory. The error was interpreted as flight data and led to the nozzles swiveling to their extreme position, and shortly thereafter to self-destruction. The full configuration of the flight control system was not analyzed or tested adequately during the Ariane 5 development program. The horizontal velocity value was actually critical only prior to launch, and hence the software was not considered flight critical after the rocket left the launch pad. However, in the case of a launch delayed near time zero, it could take a significant period for the measurements and calculations to converge if they needed to be restarted. To avoid the situation where a delayed launch was further delayed by the need to recompute this value, the calculation continued into the early stages of flight. As with many accidents, what is of interest is not the particular chain of events but rather the failure to prevent the accident at the many levels at which the chain could have been intercepted: 1) The development organization did not perform adequate V&V. 2) Software reuse is often seen as a means of cutting costs and ensuring safety because the software has already been ‘proven’. However, software which works adequately in one context can fail in another context. 3) As stated in the accident review report [3], there was a ‘culture within the Ariane programme of only addressing random hardware failures’, and thus duplicate back-up systems were seen as adequate failure-handling mechanisms.
Software failures are due to design errors, hence failure of an active system is highly correlated with failure of a duplicate backup system. 4) Real-time performance concerns, particularly for slower flight-qualified computers, can lead to removal of software protection mechanisms that are known to work; in this case, the protection for the floating-point conversion.
The board of inquiry concluded that: “software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system. Software is flexible and expressive and thus encourages highly demanding requirements, which in turn lead to complex implementations which are difficult to assess.” The fact that this software worked without error on Ariane 4, and was not critical after the rocket left the launch pad, contributed to overlooking this problem.
Mars Pathfinder. Today’s aerospace software is increasingly complex, with many processes active concurrently. The subtle interactions of concurrent software are particularly difficult to debug, and even extensive testing can fail to expose subtle timing bugs that arise later during the mission. In the July 1997 Mars Pathfinder mission, an anomaly was manifested by infrequent, unexplained system resets experienced by the Rover, which caused loss of science data. The problem was ultimately determined to be a priority inversion bug in simultaneously executing processes. Specifically, an interrupt to wake up the communications process could occur while the high-priority bus management process was waiting for the low-priority meteorological process to complete. The communications process then blocked the high-priority bus management process from running for a duration exceeding the period of a watchdog timer, leading to a system reset. It was judged after the fact that this anomaly would be impossible to detect with black-box testing.
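The inversion described above can be reproduced in a few lines. The following is a minimal sketch, not the Pathfinder flight code: a tick-based simulation of a fixed-priority preemptive scheduler with one shared lock. The task names mirror the three processes in the text, but the priorities, release times, work durations, and the `simulate` function itself are hypothetical values chosen for illustration. The `inherit` flag switches on priority inheritance, the standard remedy in which the lock holder temporarily runs at the priority of the most urgent task it is blocking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int        # larger number = more urgent
    release: int         # tick at which the task becomes ready
    work: int            # ticks of CPU time still needed
    needs_lock: bool     # True if the work runs while holding the shared lock
    done_at: Optional[int] = None


def simulate(inherit: bool, horizon: int = 50) -> dict:
    """Run a toy fixed-priority schedule; return the completion tick per task."""
    # Hypothetical timings, chosen only to make the inversion visible.
    tasks = [
        Task("meteo", priority=1, release=0, work=4, needs_lock=True),
        Task("bus", priority=3, release=1, work=2, needs_lock=True),
        Task("comms", priority=2, release=2, work=10, needs_lock=False),
    ]
    owner = None  # task currently holding the shared lock, if any

    for tick in range(horizon):
        ready = [t for t in tasks if t.release <= tick and t.work > 0]
        if not ready:
            break

        def is_blocked(t):
            # A task is blocked if it needs the lock and another task holds it.
            return t.needs_lock and owner is not None and owner is not t

        blocked = [t for t in ready if is_blocked(t)]
        runnable = [t for t in ready if not is_blocked(t)]

        def effective(t):
            # With inheritance, the lock owner is boosted to the priority
            # of the most urgent task waiting on the lock.
            if inherit and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_lock:
            owner = current          # acquire (or keep) the lock
        current.work -= 1
        if current.work == 0:
            current.done_at = tick + 1
            if owner is current:
                owner = None         # release the lock on completion

    return {t.name: t.done_at for t in tasks}


without_inheritance = simulate(inherit=False)
with_inheritance = simulate(inherit=True)
print("no inheritance:", without_inheritance)
print("inheritance:   ", with_inheritance)
```

With these assumed timings, the high-priority bus task completes at tick 16 without inheritance and at tick 6 with it. A watchdog expecting the bus task within, say, ten ticks of its release would fire only in the first case.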
It is noteworthy that a decision had been made not to use the proper priority inheritance algorithm in the high-priority bus management process because it executed frequently and was time critical, and hence the engineer wanted to optimize performance. It is precisely in such situations that correctness is essential, even at the cost of additional cycles.
Mars Climate Orbiter and Mars Polar Lander. In 1998 NASA launched two Mars missions. Unfortunately, both were lost for software-related reasons. The Mars Climate Orbiter was lost due to a navigation problem following an error in physical units, most likely resulting in the spacecraft burning up in the Martian atmosphere rather than inserting itself into an orbit around Mars. An onboard calculation measured engine thrust in foot-pounds, as specified by the engine manufacturer. This thrust was interpreted by another program on the ground in Newton-meters, as specified by the requirements document. Similar to Ariane 501, the onboard software was not given sufficient scrutiny, in part because on a previous mission the particular onboard calculations were for informational purposes only. It was not appreciated that on this mission the calculations had become critical inputs to the navigation process. The ground-based navigation team was overloaded, and an unfortunate alignment of geometry hid the accumulating navigation error until it was too late.
The Mars Polar Lander was most probably lost due to premature shutdown of the descent engine, following an unanticipated premature signal from the touchdown sensors. The spacecraft has three different sequential control modes leading up to landing on the Martian surface: entry, descent, and landing. The entry phase is driven by timing: rocket firings and other actions are performed at specific time intervals to get the spacecraft into the atmosphere. The descent phase is driven by a radar altimeter: the spacecraft descends under parachute and rocket control.
At thirty meters above the surface the altimeter is no longer reliable, so the spacecraft transitions to the landing phase, in which it awaits the jolt of the ground on one of its three legs; that jolt sets off a sensor which signals the engines to turn off. Unfortunately, the spacecraft designers did not realize that the legs bounce when they are unfolded at an altitude of 1.5 km, and that this jolt can set off the touchdown sensors, which latch a software variable. When the spacecraft enters the landing phase at 30 m and the software starts polling the flag, it will find the flag already set and shut off the engines at that point. The resulting fall would be enough to fatally damage the spacecraft.
Lessons from Software Failures during Space Missions.
1) Software failures are latent design errors, and hence are very different from hardware failures. Strategies for mitigating hardware failures, such as duplicative redundancy, are unlikely to work for software.
2) The complexity of aerospace software today precludes anything approaching ‘complete’ testing coverage of a software system. Especially difficult to test are the subtle interactions between multiple processes and different subsystems.
3) Performance optimizations that remove mechanisms for runtime protection from software faults (e.g., removal of the Ariane 5 arithmetic overflow handler for the horizontal velocity variable), even when done very carefully, have often led to failures when the fault arises in unanticipated ways.
4) Reuse of ‘qualified’ software components in slightly different contexts is not necessarily safe. The safe performance of mechanical components can be predicted based on a well-defined envelope encompassing the parameters in which the component successfully operated in previous space missions. Software components do not behave linearly, nor even as a convex function, so the notion of a safe operating envelope is fundamentally mistaken.
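Lesson 3 can be made concrete with the kind of runtime protection that was removed. The sketch below is an illustration under stated assumptions, not the Ariane flight code (which was not written in Python): the function names `to_int16_checked` and `to_int16_saturating` and the exception `ConversionOverflow` are hypothetical. Because Python integers are unbounded, the 16-bit signed range is enforced explicitly. Whether an out-of-range value should raise an error or saturate is a design decision the source does not specify; both options are shown.

```python
import math

# Range of a 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


class ConversionOverflow(ValueError):
    """Raised when a float cannot be represented as a 16-bit signed integer."""


def to_int16_checked(value: float) -> int:
    """Protected conversion: refuse values outside the 16-bit signed range."""
    if math.isnan(value) or not (INT16_MIN <= value <= INT16_MAX):
        raise ConversionOverflow(f"{value!r} does not fit in int16")
    return int(value)  # truncates toward zero


def to_int16_saturating(value: float) -> int:
    """Alternative policy: clamp out-of-range values to the nearest limit."""
    if math.isnan(value):
        raise ConversionOverflow("NaN has no integer representation")
    return int(max(INT16_MIN, min(INT16_MAX, value)))


# Hypothetical readings: one inside the old operating range, one far outside it.
for reading in (1234.7, 40000.0):
    try:
        print(reading, "->", to_int16_checked(reading))
    except ConversionOverflow as err:
        print(reading, "-> rejected:", err, "| saturated:", to_int16_saturating(reading))
```

The check costs a comparison or two per conversion, which is the kind of overhead that real-time performance pressure tends to target. The point of lesson 3 is that the saving is only safe for as long as the assumed input range actually holds.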
Although the missions beyond the next ten years are still conceptual, plans for the next ten years are reasonably well defined. Sometime in the next decade, most likely 2009, NASA plans to launch a robotic mission that will capture a sample of Martian soil, rocks, and atmosphere and return it to Earth. The software for this mission could be 100 times more complex than for the Mars Climate Orbiter. The software for missions beyond this 2009 Mars sample return, requiring the capabilities described in the first section of this paper, will be even more complex. The next section of this paper presents a framework for assessing the likelihood of success for these missions if current trends continue, and the potential for software construction and analysis tools to reverse these trends.
3. A Model for Software Reliability versus Software Complexity
The aerospace industry, like most other industries, is seeing software play an increasingly important role: the amount of software in a mission is steadily increasing over time. This has delivered substantial benefits in mission capabilities. Software is also comparatively easy to change to adapt to changing requirements, and software can even be changed after launch, making it an especially versatile means of achieving mission goals.
The following table provides historical data from a small number of space missions, and gives flight software size in thousands of lines of source code. Note that while Cassini (a mission that will enter orbit around Saturn in 2004) and Mars Pathfinder launched in the same year, development of Cassini started many years earlier. The data clearly indicates exponential growth over time in the size of flight software. This exponential growth is consistent with other sectors of aerospace, including civilian aviation and military aerospace.
In a subsequent graph we will use a log scale for thousands of lines of source code versus a log scale for expected number of mission-critical software errors to extrapolate a model for expected software reliability, and the potential impact of various kinds of tools.
Mission    Launch Year    Thousands SLOC
Publication date: 2002